Creating a Large-Scale Arabic to French Statistical MachineTranslation System
نویسندگان
چکیده
In this work, the creation of a large-scale Arabic to French statistical machine translation system is presented. We introduce all necessary steps from corpus aquisition, preprocessing the data to training and optimizing the system and eventual evaluation. Since no corpora existed previously, we collected large amounts of data from the web. Arabic word segmentation was crucial to reduce the overall number of unknown words. We describe the phrase-based SMT system used for training and generation of the translation hypotheses. Results on the second CESTA evaluation campaign are reported. The setting was in the medical domain. The prototype reaches a favorable BLEU score of 40.8%.
منابع مشابه
A Multi-Genre SMT System for Arabic to French
This work presents improvements of a large-scale Arabic to French statistical machine translation system over a period of three years. The development includes better preprocessing, more training data, additional genre-specific tuning for different domains, namely newswire text and broadcast news transcripts, and improved domain-dependent language models. Starting with an early prototype in 200...
متن کاملTranslation Model Adaptation for an Arabic/French News Translation System by Lightly-Supervised Training
Most of the existing, easily available parallel texts to train a statistical machine translation system are from international organizations that use a particular jargon. In this paper, we consider the automatic adaptation of such a translation model to the news domain. The initial system was trained on more than 200M words of UN bitexts. We then explore large amounts of in-domain monolingual t...
متن کاملThe need to create a media block for the convergence of overseas news networks
As a general diplomacy arm of the Islamic Republic of Iran, VoSiMa has extensive activities in international broadcasting of its radio and television programs. These programs are broadcast in different languages, such as English, French, Azeri, Arabic, and ... for regional and transnational audiences. The large volume of the organization's international activities is in the form of news and new...
متن کاملLarge and Diverse Language Models for Statistical Machine Translation
This paper presents methods to combine large language models trained from diverse text sources and applies them to a state-ofart French–English and Arabic–English machine translation system. We show gains of over 2 BLEU points over a strong baseline by using continuous space language models in re-ranking.
متن کاملA Human-Aided Machine Translation System for Japanese-English Patent Translation
The approach presented here enables Japanese users with no knowledge of English or legal English to generate patent claims in English from a Japanese-only interface. It exploits the highly determined structure of patent claims and merges Natural Language Generation (NLG) and Machine Translation (MT) techniques and resources as realized in the AutoPat and PC-Transfer applications. Due to its tun...
متن کامل